AITopics | future-dependent value function

We study off-policy evaluation (OPE) in partially observable environments with complex observations, with the goal of developing estimators whose guarantee avoids exponential dependence on the horizon. While such estimators exist for MDPs and POMDPs can be converted to history-based MDPs, their estimation errors depend on the state-density ratio for MDPs which becomes history ratios after conversion, an exponential object. Recently, Uehara et al. [2022a] proposed future-dependent value functions as a promising framework to address this issue, where the guarantee for memoryless policies depends on the density ratio over the latent state space. However, it also depends on the boundedness of the future-dependent value function and other related quantities, which we show could be exponential-in-length and thus erasing the advantage of the method. In this paper, we discover novel coverage assumptions tailored to the structure of POMDPs, such as outcome coverage and belief coverage, which enable polynomial bounds on the aforementioned quantities. As a side product, our analyses also lead to the discovery of new algorithms with complementary properties.

artificial intelligence, machine learning, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.63)

Add feedback

3380e8116452e0efbf36f35d95e88c94-Paper-Conference.pdf

Neural Information Processing SystemsFeb-9-2026, 22:29:47 GMT

arxiv preprint arxiv, evaluation, future-dependent value function, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre:

Research Report (0.67)
Instructional Material (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Add feedback

03d7e13f0092405804f3a381ade8f3f0-Supplemental-Conference.pdf

Neural Information Processing SystemsDec-27-2025, 15:54:16 GMT

complexity, future-dependent value function, pomdp, (13 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.31)

Add feedback

Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems

Neural Information Processing SystemsDec-27-2025, 15:54:12 GMT

We study Reinforcement Learning for partially observable dynamical systems using function approximation. We propose a new Partially Observable Bilinear Actor-Critic framework, that is general enough to include models such as observable tabular Partially Observable Markov Decision Processes (POMDPs), observable Linear-Quadratic-Gaussian (LQG), Predictive State Representations (PSRs), as well as a newly introduced model Hilbert Space Embeddings of POMDPs and observable POMDPs with latent low-rank transition.

arxiv preprint arxiv, complexity, pomdp, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

Neural Information Processing SystemsDec-24-2025, 12:47:49 GMT

We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators and fitted-Q evaluation suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs. Future-dependent value functions play similar roles as classical value functions in fully-observable MDPs. We derive a new off-policy Bellman equation for future-dependent value functions as conditional moment equations that use history proxies as instrumental variables. We further propose a minimax learning method to learn future-dependent value functions using the new Bellman equation. We obtain the PAC result, which implies our OPE estimator is close to the true policy value as long as futures and histories contain sufficient information about latent states, and the Bellman completeness.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

Neural Information Processing SystemsOct-8-2025, 10:27:49 GMT

Existing methods such as sequential importance sampling estimators suffer from the curse of horizon in POMDPs.

arxiv preprint arxiv, evaluation, future-dependent value function, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre:

Research Report (0.67)
Instructional Material (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation

Neural Information Processing SystemsMay-27-2025, 19:41:18 GMT

We study off-policy evaluation (OPE) in partially observable environments with complex observations, with the goal of developing estimators whose guarantee avoids exponential dependence on the horizon. While such estimators exist for MDPs and POMDPs can be converted to history-based MDPs, their estimation errors depend on the state-density ratio for MDPs which becomes history ratios after conversion, an exponential object. Recently, Uehara et al. [2022a] proposed future-dependent value functions as a promising framework to address this issue, where the guarantee for memoryless policies depends on the density ratio over the latent state space. However, it also depends on the boundedness of the future-dependent value function and other related quantities, which we show could be exponential-in-length and thus erasing the advantage of the method. In this paper, we discover novel coverage assumptions tailored to the structure of POMDPs, such as outcome coverage and belief coverage, which enable polynomial bounds on the aforementioned quantities.

future and history, future-dependent value function, off-policy evaluation, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)

Add feedback

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

Neural Information Processing SystemsOct-11-2024, 02:10:03 GMT

We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators and fitted-Q evaluation suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs. Future-dependent value functions play similar roles as classical value functions in fully-observable MDPs. We derive a new off-policy Bellman equation for future-dependent value functions as conditional moment equations that use history proxies as instrumental variables.

future-dependent value function, future-dependent value-based off-policy evaluation, pomdp, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)

Add feedback

Filters

Collaborating Authors

future-dependent value function

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

03d7e13f0092405804f3a381ade8f3f0-Supplemental-Conference.pdf

03d7e13f0092405804f3a381ade8f3f0-Paper-Conference.pdf

On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation

3380e8116452e0efbf36f35d95e88c94-Paper-Conference.pdf

03d7e13f0092405804f3a381ade8f3f0-Supplemental-Conference.pdf

Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs